Considere a base de dados "Ames Housing Dataset". Com o objetivo de desenvolver um modelo de predição do preço do imóvel, desenvolva os itens a seguir e entregue a análise em arquivo do tipo powerpoint ou pdf
import pandas as pd
import numpy as np
import statsmodels
import seaborn
from matplotlib import pyplot as plt
pd.options.display.max_columns = 100
df = pd.read_csv('base_1ah.csv')
print(df.shape)
(1460, 81)
df.head()
| Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196.0 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | NaN | Attchd | 2003.0 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0.0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976.0 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162.0 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001.0 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0.0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998.0 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350.0 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000.0 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |
df = df.set_index('Id')
1.1 Estatísticas descritivas: frequência, proporção, média (𝑥¯), desvio padrão (𝑠), quartis (𝑄1, 𝑥̃ , 𝑄3) (1,0)
Avaliando o tipo das variáveis na base_1ah
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1460 entries, 1 to 1460
Data columns (total 80 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MSSubClass 1460 non-null int64
1 MSZoning 1460 non-null object
2 LotFrontage 1201 non-null float64
3 LotArea 1460 non-null int64
4 Street 1460 non-null object
5 Alley 91 non-null object
6 LotShape 1460 non-null object
7 LandContour 1460 non-null object
8 Utilities 1460 non-null object
9 LotConfig 1460 non-null object
10 LandSlope 1460 non-null object
11 Neighborhood 1460 non-null object
12 Condition1 1460 non-null object
13 Condition2 1460 non-null object
14 BldgType 1460 non-null object
15 HouseStyle 1460 non-null object
16 OverallQual 1460 non-null int64
17 OverallCond 1460 non-null int64
18 YearBuilt 1460 non-null int64
19 YearRemodAdd 1460 non-null int64
20 RoofStyle 1460 non-null object
21 RoofMatl 1460 non-null object
22 Exterior1st 1460 non-null object
23 Exterior2nd 1460 non-null object
24 MasVnrType 1452 non-null object
25 MasVnrArea 1452 non-null float64
26 ExterQual 1460 non-null object
27 ExterCond 1460 non-null object
28 Foundation 1460 non-null object
29 BsmtQual 1423 non-null object
30 BsmtCond 1423 non-null object
31 BsmtExposure 1422 non-null object
32 BsmtFinType1 1423 non-null object
33 BsmtFinSF1 1460 non-null int64
34 BsmtFinType2 1422 non-null object
35 BsmtFinSF2 1460 non-null int64
36 BsmtUnfSF 1460 non-null int64
37 TotalBsmtSF 1460 non-null int64
38 Heating 1460 non-null object
39 HeatingQC 1460 non-null object
40 CentralAir 1460 non-null object
41 Electrical 1459 non-null object
42 1stFlrSF 1460 non-null int64
43 2ndFlrSF 1460 non-null int64
44 LowQualFinSF 1460 non-null int64
45 GrLivArea 1460 non-null int64
46 BsmtFullBath 1460 non-null int64
47 BsmtHalfBath 1460 non-null int64
48 FullBath 1460 non-null int64
49 HalfBath 1460 non-null int64
50 BedroomAbvGr 1460 non-null int64
51 KitchenAbvGr 1460 non-null int64
52 KitchenQual 1460 non-null object
53 TotRmsAbvGrd 1460 non-null int64
54 Functional 1460 non-null object
55 Fireplaces 1460 non-null int64
56 FireplaceQu 770 non-null object
57 GarageType 1379 non-null object
58 GarageYrBlt 1379 non-null float64
59 GarageFinish 1379 non-null object
60 GarageCars 1460 non-null int64
61 GarageArea 1460 non-null int64
62 GarageQual 1379 non-null object
63 GarageCond 1379 non-null object
64 PavedDrive 1460 non-null object
65 WoodDeckSF 1460 non-null int64
66 OpenPorchSF 1460 non-null int64
67 EnclosedPorch 1460 non-null int64
68 3SsnPorch 1460 non-null int64
69 ScreenPorch 1460 non-null int64
70 PoolArea 1460 non-null int64
71 PoolQC 7 non-null object
72 Fence 281 non-null object
73 MiscFeature 54 non-null object
74 MiscVal 1460 non-null int64
75 MoSold 1460 non-null int64
76 YrSold 1460 non-null int64
77 SaleType 1460 non-null object
78 SaleCondition 1460 non-null object
79 SalePrice 1460 non-null int64
dtypes: float64(3), int64(34), object(43)
memory usage: 923.9+ KB
Análise descritiva para variáveis numéricas
df.describe()
| MSSubClass | LotFrontage | LotArea | OverallQual | OverallCond | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | TotRmsAbvGrd | Fireplaces | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SalePrice | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1460.000000 | 1201.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1452.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1379.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 |
| mean | 56.897260 | 70.049958 | 10516.828082 | 6.099315 | 5.575342 | 1971.267808 | 1984.865753 | 103.685262 | 443.639726 | 46.549315 | 567.240411 | 1057.429452 | 1162.626712 | 346.992466 | 5.844521 | 1515.463699 | 0.425342 | 0.057534 | 1.565068 | 0.382877 | 2.866438 | 1.046575 | 6.517808 | 0.613014 | 1978.506164 | 1.767123 | 472.980137 | 94.244521 | 46.660274 | 21.954110 | 3.409589 | 15.060959 | 2.758904 | 43.489041 | 6.321918 | 2007.815753 | 180921.195890 |
| std | 42.300571 | 24.284752 | 9981.264932 | 1.382997 | 1.112799 | 30.202904 | 20.645407 | 181.066207 | 456.098091 | 161.319273 | 441.866955 | 438.705324 | 386.587738 | 436.528436 | 48.623081 | 525.480383 | 0.518911 | 0.238753 | 0.550916 | 0.502885 | 0.815778 | 0.220338 | 1.625393 | 0.644666 | 24.689725 | 0.747315 | 213.804841 | 125.338794 | 66.256028 | 61.119149 | 29.317331 | 55.757415 | 40.177307 | 496.123024 | 2.703626 | 1.328095 | 79442.502883 |
| min | 20.000000 | 21.000000 | 1300.000000 | 1.000000 | 1.000000 | 1872.000000 | 1950.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 334.000000 | 0.000000 | 0.000000 | 334.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 1900.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 2006.000000 | 34900.000000 |
| 25% | 20.000000 | 59.000000 | 7553.500000 | 5.000000 | 5.000000 | 1954.000000 | 1967.000000 | 0.000000 | 0.000000 | 0.000000 | 223.000000 | 795.750000 | 882.000000 | 0.000000 | 0.000000 | 1129.500000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 2.000000 | 1.000000 | 5.000000 | 0.000000 | 1961.000000 | 1.000000 | 334.500000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5.000000 | 2007.000000 | 129975.000000 |
| 50% | 50.000000 | 69.000000 | 9478.500000 | 6.000000 | 5.000000 | 1973.000000 | 1994.000000 | 0.000000 | 383.500000 | 0.000000 | 477.500000 | 991.500000 | 1087.000000 | 0.000000 | 0.000000 | 1464.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 3.000000 | 1.000000 | 6.000000 | 1.000000 | 1980.000000 | 2.000000 | 480.000000 | 0.000000 | 25.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6.000000 | 2008.000000 | 163000.000000 |
| 75% | 70.000000 | 80.000000 | 11601.500000 | 7.000000 | 6.000000 | 2000.000000 | 2004.000000 | 166.000000 | 712.250000 | 0.000000 | 808.000000 | 1298.250000 | 1391.250000 | 728.000000 | 0.000000 | 1776.750000 | 1.000000 | 0.000000 | 2.000000 | 1.000000 | 3.000000 | 1.000000 | 7.000000 | 1.000000 | 2002.000000 | 2.000000 | 576.000000 | 168.000000 | 68.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 8.000000 | 2009.000000 | 214000.000000 |
| max | 190.000000 | 313.000000 | 215245.000000 | 10.000000 | 9.000000 | 2010.000000 | 2010.000000 | 1600.000000 | 5644.000000 | 1474.000000 | 2336.000000 | 6110.000000 | 4692.000000 | 2065.000000 | 572.000000 | 5642.000000 | 3.000000 | 2.000000 | 3.000000 | 2.000000 | 8.000000 | 3.000000 | 14.000000 | 3.000000 | 2010.000000 | 4.000000 | 1418.000000 | 857.000000 | 547.000000 | 552.000000 | 508.000000 | 480.000000 | 738.000000 | 15500.000000 | 12.000000 | 2010.000000 | 755000.000000 |
Análise descritiva das variáveis categóricas
df.describe(include=object)
| MSZoning | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinType2 | Heating | HeatingQC | CentralAir | Electrical | KitchenQual | Functional | FireplaceQu | GarageType | GarageFinish | GarageQual | GarageCond | PavedDrive | PoolQC | Fence | MiscFeature | SaleType | SaleCondition | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1460 | 1460 | 91 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1452 | 1460 | 1460 | 1460 | 1423 | 1423 | 1422 | 1423 | 1422 | 1460 | 1460 | 1460 | 1459 | 1460 | 1460 | 770 | 1379 | 1379 | 1379 | 1379 | 1460 | 7 | 281 | 54 | 1460 | 1460 |
| unique | 5 | 2 | 2 | 4 | 4 | 2 | 5 | 3 | 25 | 9 | 8 | 5 | 8 | 6 | 8 | 15 | 16 | 4 | 4 | 5 | 6 | 4 | 4 | 4 | 6 | 6 | 6 | 5 | 2 | 5 | 4 | 7 | 5 | 6 | 3 | 5 | 5 | 3 | 3 | 4 | 4 | 9 | 6 |
| top | RL | Pave | Grvl | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | Gable | CompShg | VinylSd | VinylSd | None | TA | TA | PConc | TA | TA | No | Unf | Unf | GasA | Ex | Y | SBrkr | TA | Typ | Gd | Attchd | Unf | TA | TA | Y | Gd | MnPrv | Shed | WD | Normal |
| freq | 1151 | 1454 | 50 | 925 | 1311 | 1459 | 1052 | 1382 | 225 | 1260 | 1445 | 1220 | 726 | 1141 | 1434 | 515 | 504 | 864 | 906 | 1282 | 647 | 649 | 1311 | 953 | 430 | 1256 | 1428 | 741 | 1365 | 1334 | 735 | 1360 | 380 | 870 | 605 | 1311 | 1326 | 1340 | 3 | 157 | 49 | 1267 | 1198 |
for i, x in enumerate(df.dtypes):
if x == 'object':
print(pd.crosstab(index=df[df.columns[i]], columns='freq', dropna=False))
print('')
col_0 freq
MSZoning
C (all) 10
FV 65
RH 16
RL 1151
RM 218
col_0 freq
Street
Grvl 6
Pave 1454
col_0 freq
Alley
Grvl 50
Pave 41
col_0 freq
LotShape
IR1 484
IR2 41
IR3 10
Reg 925
col_0 freq
LandContour
Bnk 63
HLS 50
Low 36
Lvl 1311
col_0 freq
Utilities
AllPub 1459
NoSeWa 1
col_0 freq
LotConfig
Corner 263
CulDSac 94
FR2 47
FR3 4
Inside 1052
col_0 freq
LandSlope
Gtl 1382
Mod 65
Sev 13
col_0 freq
Neighborhood
Blmngtn 17
Blueste 2
BrDale 16
BrkSide 58
ClearCr 28
CollgCr 150
Crawfor 51
Edwards 100
Gilbert 79
IDOTRR 37
MeadowV 17
Mitchel 49
NAmes 225
NPkVill 9
NWAmes 73
NoRidge 41
NridgHt 77
OldTown 113
SWISU 25
Sawyer 74
SawyerW 59
Somerst 86
StoneBr 25
Timber 38
Veenker 11
col_0 freq
Condition1
Artery 48
Feedr 81
Norm 1260
PosA 8
PosN 19
RRAe 11
RRAn 26
RRNe 2
RRNn 5
col_0 freq
Condition2
Artery 2
Feedr 6
Norm 1445
PosA 1
PosN 2
RRAe 1
RRAn 1
RRNn 2
col_0 freq
BldgType
1Fam 1220
2fmCon 31
Duplex 52
Twnhs 43
TwnhsE 114
col_0 freq
HouseStyle
1.5Fin 154
1.5Unf 14
1Story 726
2.5Fin 8
2.5Unf 11
2Story 445
SFoyer 37
SLvl 65
col_0 freq
RoofStyle
Flat 13
Gable 1141
Gambrel 11
Hip 286
Mansard 7
Shed 2
col_0 freq
RoofMatl
ClyTile 1
CompShg 1434
Membran 1
Metal 1
Roll 1
Tar&Grv 11
WdShake 5
WdShngl 6
col_0 freq
Exterior1st
AsbShng 20
AsphShn 1
BrkComm 2
BrkFace 50
CBlock 1
CemntBd 61
HdBoard 222
ImStucc 1
MetalSd 220
Plywood 108
Stone 2
Stucco 25
VinylSd 515
Wd Sdng 206
WdShing 26
col_0 freq
Exterior2nd
AsbShng 20
AsphShn 3
Brk Cmn 7
BrkFace 25
CBlock 1
CmentBd 60
HdBoard 207
ImStucc 10
MetalSd 214
Other 1
Plywood 142
Stone 5
Stucco 26
VinylSd 504
Wd Sdng 197
Wd Shng 38
col_0 freq
MasVnrType
BrkCmn 15
BrkFace 445
None 864
Stone 128
col_0 freq
ExterQual
Ex 52
Fa 14
Gd 488
TA 906
col_0 freq
ExterCond
Ex 3
Fa 28
Gd 146
Po 1
TA 1282
col_0 freq
Foundation
BrkTil 146
CBlock 634
PConc 647
Slab 24
Stone 6
Wood 3
col_0 freq
BsmtQual
Ex 121
Fa 35
Gd 618
TA 649
col_0 freq
BsmtCond
Fa 45
Gd 65
Po 2
TA 1311
col_0 freq
BsmtExposure
Av 221
Gd 134
Mn 114
No 953
col_0 freq
BsmtFinType1
ALQ 220
BLQ 148
GLQ 418
LwQ 74
Rec 133
Unf 430
col_0 freq
BsmtFinType2
ALQ 19
BLQ 33
GLQ 14
LwQ 46
Rec 54
Unf 1256
col_0 freq
Heating
Floor 1
GasA 1428
GasW 18
Grav 7
OthW 2
Wall 4
col_0 freq
HeatingQC
Ex 741
Fa 49
Gd 241
Po 1
TA 428
col_0 freq
CentralAir
N 95
Y 1365
col_0 freq
Electrical
FuseA 94
FuseF 27
FuseP 3
Mix 1
SBrkr 1334
col_0 freq
KitchenQual
Ex 100
Fa 39
Gd 586
TA 735
col_0 freq
Functional
Maj1 14
Maj2 5
Min1 31
Min2 34
Mod 15
Sev 1
Typ 1360
col_0 freq
FireplaceQu
Ex 24
Fa 33
Gd 380
Po 20
TA 313
col_0 freq
GarageType
2Types 6
Attchd 870
Basment 19
BuiltIn 88
CarPort 9
Detchd 387
col_0 freq
GarageFinish
Fin 352
RFn 422
Unf 605
col_0 freq
GarageQual
Ex 3
Fa 48
Gd 14
Po 3
TA 1311
col_0 freq
GarageCond
Ex 2
Fa 35
Gd 9
Po 7
TA 1326
col_0 freq
PavedDrive
N 90
P 30
Y 1340
col_0 freq
PoolQC
Ex 2
Fa 2
Gd 3
col_0 freq
Fence
GdPrv 59
GdWo 54
MnPrv 157
MnWw 11
col_0 freq
MiscFeature
Gar2 2
Othr 2
Shed 49
TenC 1
col_0 freq
SaleType
COD 43
CWD 4
Con 2
ConLD 9
ConLI 5
ConLw 5
New 122
Oth 3
WD 1267
col_0 freq
SaleCondition
Abnorml 101
AdjLand 4
Alloca 12
Family 20
Normal 1198
Partial 125
1.2 Gráficos como: Gráficos de colunas, BoxPlot, dispersão (2,0)
Para gráficos numéricos plotamos boxplot e para categóricos gráficos de barra.
for i, x in enumerate(df.dtypes):
if x == 'int64' or x == 'float64':
plt.figure(i)
seaborn.boxplot(y = df[df.columns[i]])
elif x == 'int64' or x == 'object':
plt.figure(i)
seaborn.barplot(x = df[df.columns[i]], y = range(0,len(df)) )
/Users/karinseeder/anaconda3/lib/python3.7/site-
packages/matplotlib/pyplot.py:514: RuntimeWarning: More than 20
figures have been opened. Figures created through the pyplot interface
(`matplotlib.pyplot.figure`) are retained until explicitly closed and
may consume too much memory. (To control this warning, see the rcParam
`figure.max_open_warning`).
max_open_warning, RuntimeWarning)
from pandas.plotting import scatter_matrix
scatter_matrix(df, figsize=(16,16))
array([[<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1b97c8390>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1aacbdf28>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1aac73f28>,
...,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ab2b1470>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ab2d99e8>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ab302f60>],
[<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ab332518>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ab358a90>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ab35f048>,
...,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1aba09208>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1aba31780>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1aba59cf8>],
[<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1aba882b0>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1abab0828>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1abad8da0>,
...,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1d5b62278>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1d5ae3e48>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1a93b4390>],
...,
[<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1b9583278>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1b95ac7f0>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1b9703d68>,
...,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ba540f28>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ba5714e0>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ba598a58>],
[<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ba5bffd0>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ba5f1588>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1ba61ab00>,
...,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1baca6cc0>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1bacd6278>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1bacff7f0>],
[<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1bae26d68>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1bae59320>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1bae81898>,
...,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1bb563a58>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1bb58dfd0>,
<matplotlib.axes._subplots.AxesSubplot object at
0x7fb1bb5bd588>]],
dtype=object)
2 - Análise de correlações (𝑟𝑥𝑖,𝑥𝑗) 2.1 Correlograma (1,5) 2.2 Análise sobre correlações significativas (1,5) 3 - Desenvolvimento de modelo de Regressão utilizando Regressão Linear com o método de mínimos quadrados ordinários. Apresente as características do desenvolvimento: amostras, medidas de avaliação do modelo... 3.1 Determinação do tamanho da amostra de teste (0,5) 3.2 Medidas de desempenho para amostra de treino e para amostra de teste: 𝑅2, 𝑅𝑀𝑆𝐸, 𝑀𝐴𝐸 (2,0) 3.3 Descrição do modelo final, discussão sobre a significância obtida para os coeficientes, análise de resíduos (1,5)